Digitizing a Million Books: Challenges for Document Analysis
Internal identifier: 001088 (Main/Exploration); previous: 001087; next: 001089
Authors: Pramod Sankar [India]; Vamshi Ambati [United States]; Lakshmi Pratha [India]; V. Jawahar [India]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2006.
Abstract
This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising out of the diversity of the content in digital libraries.
Url:
DOI: 10.1007/11669487_38
Affiliations:
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000213
- to stream Istex, to step Curation: 000210
- to stream Istex, to step Checkpoint: 000A61
- to stream Main, to step Merge: 001105
- to stream Main, to step Curation: 001088
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author><name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
</author>
<author><name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</author>
<author><name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
</author>
<author><name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E96E767CE48405122392E7508C98969E20DA18DE</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_38</idno>
<idno type="url">https://api.istex.fr/document/E96E767CE48405122392E7508C98969E20DA18DE/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000213</idno>
<idno type="wicri:Area/Istex/Curation">000210</idno>
<idno type="wicri:Area/Istex/Checkpoint">000A61</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Sankar P:digitizing:a:million</idno>
<idno type="wicri:Area/Main/Merge">001105</idno>
<idno type="wicri:Area/Main/Curation">001088</idno>
<idno type="wicri:Area/Main/Exploration">001088</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author><name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
<affiliation wicri:level="4"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Institute for Software Research International, Carnegie Mellon University</wicri:regionArea>
<placeName><settlement type="city">Pittsburgh</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Inde</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">E96E767CE48405122392E7508C98969E20DA18DE</idno>
<idno type="DOI">10.1007/11669487_38</idno>
<idno type="ChapterID">38</idno>
<idno type="ChapterID">Chap38</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising out of the diversity of the content in digital libraries.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
<li>États-Unis</li>
</country>
<region><li>Pennsylvanie</li>
</region>
<settlement><li>Pittsburgh</li>
</settlement>
<orgName><li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
</noRegion>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
</country>
<country name="États-Unis"><region name="Pennsylvanie"><name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</region>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001088 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001088 | SxmlIndent | more
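Once a record has been extracted with the commands above, its affiliation tree can be post-processed with a general-purpose XML library. The following is a minimal sketch in Python, not part of Dilib: the function name `authors_by_country` and the constant `TREE_XML` are illustrative assumptions, and the input is a trimmed excerpt of the `<tree>` element of this record. The excerpt deliberately leaves out the `wicri:`-prefixed attributes, because the record as shown does not declare that namespace and a strict parser may reject it.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the <affiliations><tree> element of the record,
# keeping only the attributes used below (illustrative subset).
TREE_XML = """
<tree>
  <country name="Inde">
    <noRegion><name sortKey="Sankar, Pramod">Pramod Sankar</name></noRegion>
    <name sortKey="Jawahar, V">V. Jawahar</name>
    <name sortKey="Jawahar, V">V. Jawahar</name>
    <name sortKey="Pratha, Lakshmi">Lakshmi Pratha</name>
  </country>
  <country name="États-Unis">
    <region name="Pennsylvanie"><name sortKey="Ambati, Vamshi">Vamshi Ambati</name></region>
  </country>
</tree>
"""


def authors_by_country(xml_text):
    """Map each country name to its sorted, de-duplicated author names."""
    root = ET.fromstring(xml_text)
    result = {}
    for country in root.findall("country"):
        # iter() walks all descendants, so names nested under
        # <noRegion> or <region> are collected as well.
        names = {name.text for name in country.iter("name")}
        result[country.get("name")] = sorted(names)
    return result


if __name__ == "__main__":
    for country, names in authors_by_country(TREE_XML).items():
        print(f"{country}: {', '.join(names)}")
```

Collecting the names into a set removes the repeated entry for V. Jawahar, which appears twice in the tree because the record lists two affiliations for that author.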
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:E96E767CE48405122392E7508C98969E20DA18DE |texte= Digitizing a Million Books: Challenges for Document Analysis }}
This area was generated with Dilib version V0.6.32.